YouTube videos on Preference Optimization
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
Aligning LLMs with Direct Preference Optimization
ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
Direct Preference Optimization: Forget RLHF (PPO)
SPO: Self-Play Preference Optimization
Direct Preference Optimization in One Minute
Direct Preference Optimization (DPO) in 1 Hour
Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model Explained
Unlocking Language Models: Direct Preference Optimization
[2024 Best AI Paper] Self-Play Preference Optimization for Language Model Alignment
Direct Preference Optimization (DPO)
Hanjun Dai: Preference Optimization for Large Language Models
DISL Review: Filtered Direct Preference Optimization
Contrastive Preference Optimization Explained
Quanquan Gu - Self-Play Preference Optimization for Language Model Alignment
DPO : Direct Preference Optimization
Direct Preference Optimization (DPO): Simplifying AI Training on Human Preferences
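Several of the titles above reference the Bradley-Terry model and log probabilities underlying DPO. As a rough orientation for what those videos cover, here is a minimal, illustrative sketch of the standard DPO loss for a single preference pair (plain Python; the function name and argument names are our own, not from any specific video):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Under the Bradley-Terry preference model, the policy's implicit reward
    for a response is beta * (policy log-prob - reference log-prob), and
    the loss is -log sigmoid of the reward margin between the chosen and
    rejected responses.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin) == softplus(-margin), computed stably:
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With equal log-probabilities under policy and reference, the margin is zero and the loss equals log 2; as the policy raises the chosen response's log-probability relative to the reference, the loss decreases.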